Performance Measurements of the NERSC Cray Cascade System

Authors

  • Brian Austin
  • Matthew J. Cordery
  • Harvey J. Wasserman
  • Nicholas J. Wright
Abstract

Cray began delivery of their next generation XC30 supercomputer systems in late 2012. One of the first systems, “Edison,” was delivered to NERSC and in this paper we present preliminary performance results obtained on this machine. The primary new feature of the XC30 architecture is the Cray “Aries” interconnect that includes a 48-port high radix router with a dragonfly topology. To demonstrate the Aries’ substantial improvements in bandwidth, latency, message rate, and scalability, we present measurements of the basic performance characteristics of the system and examine the scalability of several network-centric “microbenchmarks.” Although some low-level microbenchmark results for Aries have been published previously (using prototype hardware), the unique contribution of this work consists of performance results for the NERSC Sustained System Performance (SSP) application benchmarks. The SSP benchmarks span a wide range of science domains, algorithms and implementation choices, and provide a more holistic performance metric. We examine the performance and scalability of these benchmarks on the XC30 and compare performance with other state-of-the-art HPC platforms. Edison nodes are composed of two eight-core Intel "Sandy Bridge" processors, which provide single-node performance to complement the networking improvements afforded by the Aries interconnect. Counting two hyperthreads per core, Edison has 32 hardware threads per node; thus, multi-threading is essential for obtaining optimal performance. We report the OpenMP, core-specialization and hyperthreading settings that maximize

Related resources

MPI-I/O on Franklin XT4 System at NERSC

Prior to a software upgrade and hardware maintenance on March 17th, 2009 on the Franklin Cray XT4 machine at the National Energy Research Scientific Computing (NERSC) Center, MPI-IO shared-file performance reached only a small percentage of file-per-processor POSIX performance. The March 17th upgrade unintentionally increased I/O performance significantly for a number of applications. Thi...

Estimating the Performance Impact of the MCDRAM on KNL Using Dual-Socket Ivy Bridge Nodes on Cray XC30

NERSC is preparing for its next petascale system, named Cori, a Cray XC system based on the Intel KNL MIC architecture. Each Cori node will have 72 cores (288 threads), 512-bit vector units, and a low-capacity (16 GB), high-bandwidth (~5x DDR4) on-package memory (MCDRAM or HBM). To help applications get ready for Cori, NERSC has developed optimization strategies that focus on the MPI+OpenMP p...

GPFS on a Cray XT

The NERSC Global File System (NGF) is a center-wide production file system at NERSC based on IBM’s GPFS. In this paper we will give an overview of GPFS and the NGF architecture. This will include a comparison of features and capabilities between GPFS and Lustre. We will discuss integrating GPFS with a Cray XT system. This configuration relies heavily on Cray DVS. We will describe DVS and discus...

Comparing Compiler and Library Performance in Material Science Applications on Edison

Materials science and chemistry applications are expected to represent approximately one third of the computational workload on NERSC’s Cray XC30 system, Edison. The performance of these applications can often depend sensitively on the compiler and compiler options used at build-time. For this reason, the NERSC user services group supplies users with optimized builds of the most commonly used m...

Tuning HDF5 subfiling performance on parallel file systems

Subfiling is a technique used on parallel file systems to reduce locking and contention issues when multiple compute nodes interact with the same storage target node. Subfiling provides a compromise between the single shared file approach that instigates the lock contention problems on parallel file systems and having one file per process, which results in generating a massive and unmanageable ...

Publication date: 2013